EXTREME EIGENVALUES OF SPARSE, HEAVY TAILED
RANDOM MATRICES

ANTONIO AUFFINGER AND SI TANG 


Abstract. We study the statistics of the largest eigenvalues of $p \times p$ sample covariance matrices $\Sigma_{p,n} = M_{p,n} M_{p,n}^*$ when the entries of the $p \times n$ matrix $M_{p,n}$ are sparse and have a distribution with tail $t^{-\alpha}$, $\alpha > 0$. On average the number of nonzero entries of $M_{p,n}$ is of order $n^{\mu+1}$, $0 \le \mu \le 1$. We prove that in the large $n$ limit, the largest eigenvalues are Poissonian if $\alpha < 2(1 + \mu^{-1})$ and converge to a constant in the case $\alpha > 2(1 + \mu^{-1})$. We also extend the results of [7] in the Hermitian case, removing restrictions on the number of nonzero entries of the matrix.


1. Introduction 


We study the statistics of the largest eigenvalues of sample covariance matrices when
the entries are heavy tailed and sparse. Let $x$ be a complex-valued random variable.
We say $x$ has a heavy tailed distribution with parameter $\alpha$ if its (two-sided) tail
probability satisfies

$$G_\alpha(t) := P(|x| > t) = L(t)\, t^{-\alpha}, \qquad t > 0,$$

where $\alpha > 0$ and $L$ is a slowly varying function, i.e.,

$$\lim_{t \to \infty} \frac{L(st)}{L(t)} = 1, \qquad \forall s > 0.$$

For each $n \ge 1$, let $y = y(n)$ be a Bernoulli random variable, independent of $x$, with
$P(y = 1) = n^{\mu - 1} = 1 - P(y = 0)$, where $0 \le \mu \le 1$ is a constant. The ensemble of
random sample covariance matrices that we study here is defined as follows. For each
$n \ge 1$, let $p = p(n) \in \mathbb{N}$ be a function of $n$ such that

$$p/n \to \rho, \qquad 0 < \rho \le 1,$$

as $n \to \infty$. Let $A_{p,n} = [a_{ij}]_{i,j=1}^{p,n}$ and $B_{p,n} = [b_{ij}]_{i,j=1}^{p,n}$ be $p \times n$ random matrices
whose entries are i.i.d. copies of $x$ and $y$, respectively. Form the $p \times n$ matrix $M_{p,n} =
A_{p,n} \cdot B_{p,n} = [m_{ij}]_{i,j=1}^{p,n}$ by setting $m_{ij} = a_{ij} b_{ij}$. Then

$$\Sigma_{p,n} := M_{p,n} M_{p,n}^*$$

is the $p \times p$ sparse, heavy tailed random sample covariance matrix with parameters $\alpha$
and $\mu$. Note that $\Sigma_{p,n}$ is positive semi-definite so all its eigenvalues are non-negative.
The extreme eigenvalues of $\Sigma_{p,n}$ are the main subject of this paper. We will see
that, depending on the tail exponent $\alpha$ and the sparsity exponent $\mu$, when properly
rescaled, the top eigenvalues will either converge to a Poisson point process or to the
right edge of the Marchenko-Pastur law.
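
The construction above can be sketched numerically. The following is a minimal illustration only, under assumptions that are not part of the paper: the entries of $A_{p,n}$ are taken real and symmetric with exact Pareto tail $P(|a| > t) = t^{-\alpha}$ for $t \ge 1$ (so the slowly varying part is constant), and the helper names `sample_sparse_heavy_tailed` and `sample_covariance` are hypothetical.

```python
import numpy as np

def sample_sparse_heavy_tailed(p, n, alpha, mu, rng):
    """Return a p x n matrix M = A * B (entrywise product).

    Assumption for illustration: |a_ij| is Pareto with P(|a| > t) = t**(-alpha), t >= 1,
    with a random sign; b_ij is Bernoulli(n**(mu - 1)), independent of a_ij.
    """
    # Inverse-CDF sampling: for U uniform on (0, 1], U**(-1/alpha) has tail t**(-alpha).
    u = 1.0 - rng.random((p, n))              # values in (0, 1]
    magnitudes = u ** (-1.0 / alpha)
    signs = rng.choice([-1.0, 1.0], size=(p, n))
    a = signs * magnitudes
    b = rng.random((p, n)) < n ** (mu - 1.0)  # sparsity mask
    return a * b

def sample_covariance(m):
    """Sigma = M M^* (positive semi-definite by construction)."""
    return m @ m.conj().T

rng = np.random.default_rng(0)
M = sample_sparse_heavy_tailed(p=50, n=100, alpha=1.5, mu=0.5, rng=rng)
Sigma = sample_covariance(M)
eigenvalues = np.linalg.eigvalsh(Sigma)       # ascending order
```

With these parameters each entry is nonzero with probability $100^{-1/2} = 0.1$, so about $p\,n\,n^{\mu-1} = 500$ entries of `M` are expected to be nonzero.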

To put our theorems in context, we briefly review past results. The study of extreme 
eigenvalues of heavy tailed random matrices started with the work of Soshnikov. In 
[26], he proved that if 0 < a < 2, the asymptotic behavior of the top eigenvalues of 
a heavy tailed Hermitian matrix is determined by the behavior of the largest entries 
of the matrix, i.e., the point process of the largest eigenvalues (properly normalized) 
converges to a Poisson point process, as in the usual extreme value theory for i.i.d. 
random variables. This result was extended to sample covariance matrices and to all
values of $\alpha \in (0,4)$ in the work of Auffinger, Ben Arous and Péché [2]. The upper
bound on the tail exponent a is optimal as for entries with finite fourth moment, the 
largest eigenvalues converge to the right edge of the bulk distribution and have Tracy- 
Widom fluctuations [4, 5, 17, 28]. Eigenvector localization and delocalization were 
studied in [6]. In the physics literature, many of these results were predicted in the 
seminal paper of Bouchaud and Cizeau [13]. 

The largest eigenvalues of sparse Hermitian random matrices with bounded moments
were investigated by Benaych-Georges and Péché [8] under the assumption of at least
$\omega(\log n)$ nonzero entries in each row. They extended the results of [16, 25], establishing
the convergence of the largest eigenvalue to the edge, and also obtained results on
localization/delocalization of eigenvectors. For bulk statistics in the sparse setting,
readers are invited to see Erdős, Knowles, Yau, and Yin [14] and the references therein.

In [7], Benaych-Georges and Péché considered a class of $n \times n$ Hermitian, heavy
tailed, sparse matrices. In their work, the authors looked at matrices where, in $n - o(n)$
rows, the number of nonzero entries was asymptotically equal to $n^{\mu}$ for $\mu \in (0,1]$. For
the remaining $o(n)$ rows, the number of nonzero entries was no more than $n^{\mu}$. This
assumption is well-suited to treat the case of heavy-tailed band matrices. In the last
section, we will extend the work of [7] by removing all restrictions on the number
of nonzero entries in each row, allowing, for instance, the sparsity to come from the
adjacency matrix of an Erdős-Rényi random graph.

Although we extend the results of [7], the main objective of this paper is to treat the
spectrum of sample covariance matrices $\Sigma_{p,n}$ constructed from a sparse matrix $M_{p,n}$.
These matrices naturally appear in applications such as models of complex networks
with two species of nodes [19] and also in information theory as channel capacity of
wideband CDMA schemes [29]. For more applications and predictions one can look at
[18, 20, 23, 24] and the references therein. In the mathematical literature, as far as we
know, there are no results dealing with the top eigenvalues of sparse sample covariance
matrices. The main purpose of this paper is to provide such results.

Throughout the paper, we will use $\lambda_l(A)$ to denote the $l$-th largest eigenvalue of
a Hermitian matrix $A$, and $v_l(A)$ the corresponding eigenvector. For a matrix $A = [a_{ij}]$,
either Hermitian or rectangular, $a_{i_l j_l}$ denotes its $l$-th largest entry in absolute value
in the upper-triangular part (if $A$ is Hermitian) or of all entries (if $A$ is rectangular), and
$\theta_l(A)$ denotes its argument, i.e., $\theta_l(A) = \arg(a_{i_l j_l})$. Let $e_1, \ldots, e_n$ represent the canonical
basis vectors for $\mathbb{R}^n$. The notation $f(x) \sim g(x)$ means that there exists some slowly
varying function $l(x)$ such that $f(x) = l(x) g(x)$. A sequence of events $(E_n)_{n \ge 1}$ is
said to occur with exponentially high probability (w.e.h.p.) if there exist $C, \theta > 0$ and
$n_0 \in \mathbb{N}$ such that for $n > n_0$, $P(E_n) \ge 1 - e^{-C n^{\theta}}$. We will also use the following matrix
norms:

$$\|A\|_\infty := \max_i \sum_j |a_{ij}|, \qquad \|A\|_1 := \max_j \sum_i |a_{ij}|, \qquad \|A\| := \max_{v : \|v\|_2 = 1} \|Av\|_2.$$
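
As a small illustration of the three norms just defined, the sketch below computes them with NumPy on a hand-picked $2 \times 3$ matrix (the matrix and the function names `norm_inf`, `norm_one`, `norm_op` are illustrative choices, not notation from the paper). It also evaluates the standard inequality $\|A\|^2 \le \|A\|_1 \|A\|_\infty$, which is the form in which these norms are used later in the proofs.

```python
import numpy as np

def norm_inf(a):
    """Maximum absolute row sum: max_i sum_j |a_ij|."""
    return np.abs(a).sum(axis=1).max()

def norm_one(a):
    """Maximum absolute column sum: max_j sum_i |a_ij|."""
    return np.abs(a).sum(axis=0).max()

def norm_op(a):
    """Operator (spectral) norm: the largest singular value."""
    return np.linalg.norm(a, 2)

A = np.array([[1.0, -2.0, 0.0],
              [3.0,  0.0, 4.0]])

row_norm = norm_inf(A)   # rows sum to 3 and 7
col_norm = norm_one(A)   # columns sum to 4, 2 and 4
op_norm = norm_op(A)
bound_holds = op_norm ** 2 <= row_norm * col_norm + 1e-12
```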

The rest of the paper is organized as follows. In Section 2, we state our main
results. In Section 3, a few key lemmas are stated and proved. Section 4 is
devoted to the proof of the main theorems, while in Section 5 we present the Hermitian
case and other extensions.


2. Main results 


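Our main results are about the eigenvalues of the sample covariance matrix $\Sigma_{p,n} =
M_{p,n} M_{p,n}^*$. In our setting, there are approximately $p \cdot n \cdot n^{\mu-1} \sim n^{1+\mu}$ nonzero entries
in $M_{p,n}$. We know from extreme value theory [22, Section 1.2] that the scaling factor
for the largest entries of the matrix $M_{p,n}$ should be

(1) $$c_{np} := \inf\Big\{ t : G_\alpha(t) \le \frac{1}{n^{\mu} p} \Big\}.$$

Moreover, $c_{np} \sim n^{(1+\mu)/\alpha}$ and

$$\lim_{n \to \infty} P\Big( \max_{i,j} c_{np}^{-1} |m_{ij}| \le t \Big) = e^{-t^{-\alpha}}, \qquad t > 0.$$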
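
For orientation, here is a worked special case of the scaling (1). It is only a sketch under an assumption not made in the paper: the tail is exactly Pareto, $G_\alpha(t) = t^{-\alpha}$ for $t \ge 1$, so that the slowly varying function is $L \equiv 1$.

```latex
% Illustrative special case (assumption): G_\alpha(t) = t^{-\alpha} for t \ge 1, i.e. L \equiv 1.
% Each entry satisfies P(|m_{ij}| > s) = n^{\mu-1} s^{-\alpha} for s \ge 1, and there are np entries.
\begin{align*}
c_{np} &= \inf\{t : t^{-\alpha} \le (n^{\mu} p)^{-1}\} = (n^{\mu} p)^{1/\alpha}, \\
P\Big(\max_{i,j} c_{np}^{-1}|m_{ij}| \le t\Big)
  &= \big(1 - n^{\mu-1}(c_{np} t)^{-\alpha}\big)^{np}
   = \Big(1 - \frac{t^{-\alpha}}{np}\Big)^{np}
   \xrightarrow[n\to\infty]{} e^{-t^{-\alpha}}, \qquad t > 0.
\end{align*}
```

The second line is valid as soon as $c_{np} t \ge 1$, which holds for fixed $t > 0$ and $n$ large. Since $p/n \to \rho$, this gives $c_{np} = (n^{\mu} p)^{1/\alpha}$ of order $n^{(1+\mu)/\alpha}$, in agreement with the general statement.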

Our first theorem says that when $0 < \alpha < 2(1 + \mu^{-1})$, the extreme eigenvalues of
$\Sigma_{p,n}$ behave like the square of the top entries of $M_{p,n}$.



Theorem 2.1. Suppose $0 < \alpha < 2(1 + \mu^{-1})$. For $(1 + \mu^{-1}) < \alpha < 2(1 + \mu^{-1})$, we also
assume that $x$ is centered. Then as $n \to \infty$, we have for each $l \ge 1$

$$\frac{\lambda_l(\Sigma_{p,n})}{|m_{i_l j_l}|^2} \xrightarrow{P} 1.$$

The eigenvectors are localized: for each $l \ge 1$, $\|v_l(\Sigma_{p,n}) - e_{i_l}\|_2 \xrightarrow{P} 0$.
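
A quick numerical sanity check of the statement can be sketched as follows. This is an illustration only, not part of the proof: it assumes real entries with exact Pareto tail and random signs, and the parameter values are arbitrary. The code computes the ratio $\lambda_1(\Sigma_{p,n}) / |m_{i_1 j_1}|^2$ and the modulus of the $i_1$-th coordinate of the top eigenvector. Note that the ratio is always at least 1, because $\lambda_1(\Sigma_{p,n}) \ge \langle \Sigma_{p,n} e_{i_1}, e_{i_1} \rangle \ge |m_{i_1 j_1}|^2$; the content of the theorem is that it is also close to 1 for large $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, alpha, mu = 100, 200, 0.8, 0.5   # here alpha < 2 * (1 + 1 / mu) = 6

# Assumed entry law (illustration): Pareto magnitudes with random signs, Bernoulli mask.
u = 1.0 - rng.random((p, n))                        # uniform on (0, 1]
a = rng.choice([-1.0, 1.0], size=(p, n)) * u ** (-1.0 / alpha)
b = rng.random((p, n)) < n ** (mu - 1.0)
m = a * b

sigma = m @ m.T
eigenvalues, eigenvectors = np.linalg.eigh(sigma)   # ascending eigenvalues
top_eigenvalue = eigenvalues[-1]
top_vector = eigenvectors[:, -1]

# Position (i_1, j_1) of the largest entry in absolute value.
i1, j1 = np.unravel_index(np.argmax(np.abs(m)), m.shape)

ratio = top_eigenvalue / m[i1, j1] ** 2   # Theorem 2.1 predicts a value near 1
overlap = abs(top_vector[i1])             # localization predicts a value near 1
```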


It follows from Theorem 2.1 and a routine computation (see [2, Page 593]) that the
random point processes

(2) $$Q_n = \sum_{i=1}^{p} \sum_{j=1}^{n} \delta_{c_{np}^{-2} |m_{ij}|^2} \qquad \text{and} \qquad \sum_{i=1}^{p} \delta_{c_{np}^{-2} \lambda_i(\Sigma_{p,n})}$$

converge in distribution to the same Poisson point process on $(0, +\infty)$ with intensity
$\frac{\alpha}{2} x^{-1-\alpha/2}$.

Remark 1. As mentioned in the introduction, the conclusion of the theorem above
holds in the non-sparse case if and only if $0 < \alpha < 4$. Roughly speaking, when
we introduce sparseness, we increase the localization of the eigenvectors towards the
position of the largest entry and the Poissonian limit holds with lighter tails ($2(1 +
\mu^{-1}) \ge 4$). Note that, when $\mu = 0$, any polynomial tail is allowed. This was also
observed in [7], see Section 5 below for more details. One should also note that although
$M_{p,n}$ is sparse, $\Sigma_{p,n}$ is, in general, not.

In the second regime, $\alpha > 2(1 + \mu^{-1})$, the Poissonian limit no longer holds. The
largest eigenvalues, when normalized by $n^{\mu}$, converge to the edge of the bulk distribution.
We also need the following definition. For $L \in \mathbb{N}$ and $\eta \in (0,1]$, we say that a
unit vector $v = (v_1, \ldots, v_n) \in \mathbb{C}^n$ is $(L, \eta)$-localized if there exists a set $S \subset \{1, \ldots, n\}$
with cardinality $L$ such that

$$\sum_{j \in S} |v_j|^2 > 1 - \eta.$$
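
The definition can be checked directly: among all sets $S$ of cardinality $L$, the sum $\sum_{j \in S} |v_j|^2$ is largest when $S$ consists of the $L$ coordinates of largest modulus, so it suffices to test that set. A minimal sketch (the function name `is_localized` is a hypothetical helper, not notation from the paper):

```python
import numpy as np

def is_localized(v, L, eta):
    """Return True if the unit vector v is (L, eta)-localized.

    The best candidate set S of size L is given by the L largest values of |v_j|^2.
    """
    weights = np.sort(np.abs(np.asarray(v)) ** 2)[::-1]   # descending
    return bool(weights[:L].sum() > 1.0 - eta)

# A canonical basis vector is (1, eta)-localized for every eta in (0, 1].
e1 = np.zeros(100)
e1[0] = 1.0

# A flat unit vector carries mass L / n on any L coordinates.
flat = np.ones(100) / 10.0
```

For example, `is_localized(e1, 1, 0.1)` holds, while the flat vector is not $(10, 0.5)$-localized since any 10 coordinates carry mass $0.1$.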


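Theorem 2.2. Suppose $\alpha > 2(1 + \mu^{-1})$, and $x$ has mean zero and variance one. Then
for each $l \ge 1$, as $n \to \infty$, we have

$$\frac{\lambda_l(\Sigma_{p,n})}{n^{\mu}} \xrightarrow{P} (1 + \sqrt{\rho})^2.$$

The eigenvectors of $\Sigma_{p,n}$ are delocalized, namely, there exist $\beta, \eta_0 > 0$ such that for
each $l \ge 1$, $0 < \eta < \eta_0$ we have

(3) $$P\big( v_l(\Sigma_{p,n}) \text{ is } (\lfloor n^{\beta} \rfloor, \eta)\text{-localized} \big) \to 0$$

as $n \to \infty$.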


Remark 2. In the regime of both Theorems 2.2 and 5.2 below, when $\mu = 0$, the critical
case of an Erdős-Rényi adjacency matrix, we are forced to take $\alpha = \infty$, which is not
allowed. In this case, it is still an open question to obtain explicit formulas for the
limiting spectral distribution. To see more in this direction, the reader is invited to
check [14] and the references therein.

Remark 3. The form of delocalization in (3) is relatively simple compared to the results
obtained in [6, 12] when considering non-heavy tailed distributions for Wigner matrices.
In words, (3) says that if $\alpha > 2(1 + \mu^{-1})$, eigenvectors must have nonzero coordinates
spread over at least $n^{\beta}$ coordinates, different from the case $\alpha < 2(1 + \mu^{-1})$ where the
number of nonzero coordinates does not diverge with $n$.

Remark 4. One can also take $P(y = 1) = f(n) n^{\mu-1}$ for a slowly varying function
$f > 0$. The results of the above theorems remain true, with an additional slowly
varying function in the normalization of the entries.

Remark 5. Our results also hold if we deterministically specify the positions of the
nonzero entries in $M_{p,n}$ or if the number of nonzero entries in each row is nonrandom
and $\sim n^{\mu}$. See Remark 6 in Section 3.

3. Some useful lemmas

In this section we collect some tools and lemmas that will be used throughout the 
proofs of the main results. 

3.1. Results on the magnitudes of entries. 

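Lemma 3.1. Suppose $M_{p,n}$ is the $p \times n$ rectangular, sparse, heavy tailed matrix. Let
$c_{np}$ be as given in (1). Then, for all values of $\alpha > 0$ and $\eta > 0$, we have:

(a) $P\Big( \exists i, \exists j_1 \ne j_2, 1 \le i \le p, 1 \le j_1, j_2 \le n : \min(|m_{i j_1}|, |m_{i j_2}|) > c_{np}^{\frac{1+2\mu}{2(1+\mu)} + \eta} \Big) \to 0.$

(b) $P\Big( \exists j, \exists i_1 \ne i_2, 1 \le i_1, i_2 \le p, 1 \le j \le n : \min(|m_{i_1 j}|, |m_{i_2 j}|) > c_{np}^{\frac{1+2\mu}{2(1+\mu)} + \eta} \Big) \to 0.$

Proof. Since $a_{ij}$ has a heavy tailed distribution and $c_{np} \sim n^{(1+\mu)/\alpha}$ as given in (1), then

$$P(|a_{ij}| > c_{np}^{\theta}) \sim n^{-(1+\mu)\theta}, \qquad \forall \theta > 0.$$

Hence for the sparse matrix $M_{p,n}$, we have

$$P(|m_{ij}| > c_{np}^{\theta}) \sim n^{\mu - 1 - (1+\mu)\theta}.$$

The proof of (a) will follow from this fact and a union bound. Precisely, writing
$\kappa = \frac{1+2\mu}{2(1+\mu)} + \eta$, the left side of (a) is bounded by

$$p\, n^2 \big[ P(|m_{ij}| > c_{np}^{\kappa}) \big]^2 \sim p\, n^{2 + 2(\mu-1) - 2(1+\mu)\kappa} = p\, n^{-1 - 2(1+\mu)\eta} \to 0,$$

where we used the fact that $p/n \to \rho$. The proof of (b) is similar.

□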


3.2. Results on the sum of entries within rows and columns. The following
lemma will be used to control the sum of absolute values within a given row or a given
column of $M_{p,n}$.

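Lemma 3.2. Let $M_{p,n}$ be the sparse, heavy tailed, rectangular random matrix.

(a) For any sequence $\beta_n \sim n^{b}$, where $0 \le b \le \frac{\mu}{\alpha}$, and $\forall \epsilon > 0$, w.e.h.p.,

(4) $$\sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{|m_{ij}| \le \beta_n\}} \le n^{\mu + b(1-\alpha)^{+} + \epsilon},$$

where $(1 - \alpha)^{+} := \max\{1 - \alpha, 0\}$.

(b) If $\alpha > 1$ and $\mu > 0$, then for any sequences $\alpha_n \sim n^{a}$ and $\beta_n \sim n^{b}$ with $0 \le a < b \le \frac{\mu}{\alpha}$
and for any $\epsilon > 0$, we have, w.e.h.p.,

$$\sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{\alpha_n < |m_{ij}| \le \beta_n\}} \le n^{\mu - a(\alpha - 1) + \epsilon}.$$

(c) If $\mu > 0$, then for any sequences $\alpha_n \sim n^{\frac{\mu}{\alpha} - \eta}$ and $\beta_n \sim n^{\frac{\mu}{\alpha} + \eta'}$ with $\eta, \eta' > 0$, and
any $\epsilon > \alpha\eta + \eta'$, we have, w.e.h.p.,

$$\sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{\alpha_n < |m_{ij}| \le \beta_n\}} \le n^{\frac{\mu}{\alpha} + \epsilon}.$$

(d) For any sequences $\alpha_n \sim n^{\frac{\mu}{\alpha} + \eta}$ and $\beta_n \sim n^{b}$ with $\eta, b > 0$, and any $\gamma > b$, we
have w.e.h.p.,

$$\sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{\alpha_n < |m_{ij}| \le \beta_n\}} \le n^{\gamma}.$$

Moreover, we have the same bounds for the column sums, that is, the same results in
(a)-(d) hold if we replace $\sum_{j=1}^{n}$ by $\sum_{i=1}^{p}$ in each part.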

This lemma is modified from [7, Proposition A.6]. The main difference is that
in [7, Proposition A.6], because each row has asymptotically $\sim n^{\mu}$ nonzero entries
(nonrandom), the summation in each part ran through only the $\sim n^{\mu}$ nonzero terms,
whereas in our setting, the number of nonzero terms in each row and column and their
positions are random and hence we include every term in a row or column. However, the
proof strategy is similar and relies on the following consequence of Bennett's inequality
[9] (see also [7, Lemma A.7]).


Lemma 3.3. For each $n \ge 1$, let $X_1, \ldots, X_m$ be independent Bernoulli random variables
with parameter $p$, where $m$ and $p$ depend on the parameter $n$. If $mp \ge C n^{\theta}$ for
some constants $C, \theta > 0$, then for any $\eta > 0$, w.e.h.p.,

$$\Big| \frac{1}{m} \sum_{i=1}^{m} X_i - p \Big| \le \eta p.$$


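Proof of Lemma 3.2.

(a) Assume first that $\mu > 0$. Choose $\epsilon_0 \in (0, \epsilon)$ such that $b/\epsilon_0 \notin \mathbb{Z}$. Let $T = \lfloor b/\epsilon_0 \rfloor$,
then $T\epsilon_0 < b$. Hence we may choose

$$\theta := \frac{\mu - \alpha T \epsilon_0}{2} > \frac{\mu - \alpha b}{2} \ge \frac{\mu - \mu}{2} = 0.$$

For each $k = 0, 1, \ldots, T$, define $\gamma_i^{(k)} := \#\{|m_{ij}| : n^{k\epsilon_0} < |m_{ij}| \le n^{(k+1)\epsilon_0}\}$. The
summation on the left side of (4) is bounded by

$$\sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{0 < |m_{ij}| \le 1\}} + \sum_{k=0}^{T} \gamma_i^{(k)} n^{(k+1)\epsilon_0} =: (I) + (II).$$

Note that (I) is bounded by the number of nonzero entries in row $i$; by Lemma 3.3, we have, w.e.h.p.,

(5) $$(I) \le \sum_{j=1}^{n} \mathbf{1}_{\{m_{ij} \ne 0\}} \le 2n^{\mu} = o(n^{\mu + \epsilon}).$$

Moreover, $\gamma_i^{(k)} \le \sum_{j=1}^{n} \mathbf{1}_{\{n^{k\epsilon_0} < |m_{ij}|\}}$, where each $\mathbf{1}_{\{n^{k\epsilon_0} < |m_{ij}|\}}$ is an independent
copy of a $\mathrm{Ber}(L(n^{k\epsilon_0}) n^{\mu - 1 - \alpha k \epsilon_0})$ random variable. Since $\mu - \alpha k \epsilon_0 \ge \theta > 0$ for all
$k = 0, \ldots, T$, we know that w.e.h.p., by Lemma 3.3,

$$\gamma_i^{(k)} \le 2 L(n^{k\epsilon_0}) n^{\mu - \alpha k \epsilon_0}.$$

Thus, w.e.h.p., (II) is no more than

$$2 \sum_{k=0}^{T} L(n^{k\epsilon_0}) n^{\mu - \alpha k \epsilon_0} n^{(k+1)\epsilon_0} = 2 n^{\mu + \epsilon_0} \sum_{k=0}^{T} L(n^{k\epsilon_0}) n^{(1-\alpha) k \epsilon_0}.$$

Write $L^{*}_n := \max_{0 \le k \le T} L(n^{k\epsilon_0})$. If $0 < \alpha < 1$, then $n^{(1-\alpha)k\epsilon_0} \le n^{(1-\alpha)T\epsilon_0}$. Using $b > T\epsilon_0$ and $\epsilon_0 < \epsilon$, we have

(6) $$(II) \le 2 L^{*}_n (T+1)\, n^{\mu + \epsilon_0 + (1-\alpha) T \epsilon_0} \le n^{\mu + b(1-\alpha) + \epsilon}.$$

If $\alpha \ge 1$, then $n^{(1-\alpha)k\epsilon_0} \le 1$. Using again $\epsilon_0 < \epsilon$, we obtain,

(7) $$(II) \le 2 L^{*}_n\, n^{\mu + \epsilon_0} (T + 1) \le n^{\mu + \epsilon}.$$

Combining (5), (6) and (7), we prove Part (a) for $\mu > 0$. In the case $\mu = 0$, then
$b = 0$ and the number $V_n$ of nonzero terms in the sum $\sum_{0 < |m_{ij}| \le \beta_n} |m_{ij}|$ converges in
distribution to a Poisson random variable of mean 1. It suffices then to bound the
sum in (4) by $\beta_n V_n$ to obtain the desired result.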

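(b) Just like in Part (a), we choose $\epsilon_0 < \epsilon$ such that $\frac{b-a}{\epsilon_0} \notin \mathbb{Z}$, and take $T = \lfloor \frac{b-a}{\epsilon_0} \rfloor$.
Set $\theta = \mu - \alpha(a + T\epsilon_0) > 0$. Define

$$\gamma_i^{(k)} := \#\{|m_{ij}| : \alpha_n n^{k\epsilon_0} < |m_{ij}| \le \alpha_n n^{(k+1)\epsilon_0}\}.$$

Then, for each $k$, $\gamma_i^{(k)}$ is bounded by $\#\{|m_{ij}| : \alpha_n n^{k\epsilon_0} < |m_{ij}|\}$, which is distributed
as a $\mathrm{Bin}(n, n^{\mu-1} L(\alpha_n n^{k\epsilon_0})(\alpha_n n^{k\epsilon_0})^{-\alpha})$ random variable. Again, for each
$k \le T$,

$$n \cdot n^{\mu-1} L(\alpha_n n^{k\epsilon_0})(\alpha_n n^{k\epsilon_0})^{-\alpha} \sim n^{\mu - (a + k\epsilon_0)\alpha} \ge n^{\theta}.$$

From Lemma 3.3, w.e.h.p., $\gamma_i^{(k)} \le 2n^{\mu - \alpha(a + k\epsilon_0) + \delta}$, for arbitrarily small $\delta > 0$.
Then, we have w.e.h.p., for arbitrarily small $\delta', \delta'' > 0$,

$$\sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{\alpha_n < |m_{ij}| \le \beta_n\}} \le \sum_{k=0}^{T} \gamma_i^{(k)} \alpha_n n^{(k+1)\epsilon_0} \le 2 \sum_{k=0}^{T} n^{\mu - \alpha(a + k\epsilon_0) + \delta}\, n^{a + (k+1)\epsilon_0 + \delta'}$$

$$\le n^{\mu + \epsilon_0 + \delta'' - a(\alpha - 1)} \sum_{k=0}^{T} n^{k\epsilon_0(1 - \alpha)} \le (T+1)\, n^{\mu + \epsilon_0 + \delta'' - a(\alpha - 1)} \le n^{\mu - a(\alpha - 1) + \epsilon}.$$
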
k=0 
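(c) For any $\delta, \delta' > 0$, the left side is no larger than

$$n^{\frac{\mu}{\alpha} + \eta' + \delta'} \cdot \#\{ j : |m_{ij}| > n^{\frac{\mu}{\alpha} - \eta - \delta} \},$$

where $\#\{ j : |m_{ij}| > n^{\frac{\mu}{\alpha} - \eta - \delta} \} \sim \mathrm{Bin}(n, L'(n) n^{-1 + \alpha(\eta + \delta)})$ for some slowly varying
function $L'$. Since

$$n \cdot n^{-1 + \alpha(\eta + \delta)} \ge n^{\theta}, \qquad \text{for } \theta = \alpha\eta/2 > 0,$$

then w.e.h.p., $\#\{ j : |m_{ij}| > n^{\frac{\mu}{\alpha} - \eta - \delta} \} \le 2L'(n) n^{\alpha(\eta + \delta)}$. But $\delta, \delta' > 0$ can be chosen
arbitrarily small, so as long as $\epsilon > \alpha\eta + \eta'$, w.e.h.p., the left side is bounded by

$$n^{\frac{\mu}{\alpha} + \eta' + \delta'} \cdot 2L'(n) n^{\alpha(\eta + \delta)} = n^{\frac{\mu}{\alpha} + \alpha\eta + \eta'} \cdot (2L'(n) n^{\alpha\delta + \delta'}) \le n^{\frac{\mu}{\alpha} + \epsilon}.$$

(d) We compute this probability directly. Write $S_i := \sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{\alpha_n < |m_{ij}| \le \beta_n\}}$.
For any $\gamma > b$, choose $\epsilon, \epsilon' > 0$ sufficiently small (e.g., $\epsilon < \alpha\eta/2$, $\epsilon' < (\gamma - b)/2$),

$$P(S_i > n^{\gamma}) \le P(\text{at least } \lfloor n^{\gamma}/\beta_n \rfloor \text{ terms are nonzero among the } n \text{ } m_{ij}\text{'s})$$

□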

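3.3. An upper bound on the trace. We now prove an upper bound for the norm
of the truncated sample covariance matrix, as given below.

Theorem 3.4. Suppose $\alpha > 2$ and $x$ has mean zero and variance one. Let $M_{p,n} =
A_{p,n} \cdot B_{p,n}$ be the sparse $p \times n$ matrix with heavy tailed entries. Consider $\gamma, \gamma' > 0$ such
that $\gamma' > \gamma$ and $\gamma' \ge \frac{\mu}{2}$. Define the truncated matrix

$$\hat{M}_{p,n} = [m_{ij} \mathbf{1}_{\{|m_{ij}| \le n^{\gamma}\}}]_{i,j=1}^{p,n}.$$

We also assume that the truncated entries are centered. Then for any $\kappa > 1$,

$$P\big( \|\hat{M}_{p,n} \hat{M}_{p,n}^*\| \ge \kappa n^{2\gamma'} (1 + \sqrt{\rho})^2 \big) \to 0.$$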

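Proof. For $\kappa > 1$ given, we find $C \in (1, \kappa)$ and let $E_n$ be the event

$$E_n = \{ L \le C n^{\mu},\ \tilde{L} \le C p n^{\mu - 1} \},$$

where $L$ (resp. $\tilde{L}$) is the maximum number of nonzero entries in a row (resp. a column)
among the $p$ rows (resp. $n$ columns) of $M_{p,n}$. We break the desired probability into
two parts:

$$P\big( \|\hat{M}_{p,n}\hat{M}_{p,n}^*\| \ge \kappa n^{2\gamma'}(1 + \sqrt{\rho})^2, E_n \big) + P\big( \|\hat{M}_{p,n}\hat{M}_{p,n}^*\| \ge \kappa n^{2\gamma'}(1 + \sqrt{\rho})^2, E_n^c \big).$$

Since the second term is bounded by $P(E_n^c)$, which vanishes as $n \to \infty$ by the Chernoff
inequality, it suffices to prove that the first term vanishes as well. To do this, we
choose $\gamma'' > 0$ such that $\gamma' > \gamma + 6\gamma''$ and set $k = k_n = \lfloor n^{\gamma''} \rfloor$. We will prove that for
any $\delta > 0$ small,

(8) $$E\big( \mathrm{Tr}(\hat{M}_{p,n}\hat{M}_{p,n}^*)^k \mathbf{1}_{E_n} \big) \le P(n) \big[ (1 + \sqrt{\delta})^2 C n^{2\gamma'} (1 + \sqrt{\rho})^2 \big]^k,$$

where $P(n)$ is some polynomial of $n$. It will then follow from (8) that

$$P\big( \|\hat{M}_{p,n}\hat{M}_{p,n}^*\| \ge \kappa n^{2\gamma'}(1 + \sqrt{\rho})^2, E_n \big) \le \frac{E\big( \mathrm{Tr}(\hat{M}_{p,n}\hat{M}_{p,n}^*)^k \mathbf{1}_{E_n} \big)}{\kappa^k n^{2\gamma' k} (1 + \sqrt{\rho})^{2k}} \le P(n) \Big[ \frac{C(1 + \sqrt{\delta})^2}{\kappa} \Big]^k.$$

The right side goes to zero as $n \to \infty$, if we choose $\delta > 0$ such that $C(1 + \sqrt{\delta})^2 < \kappa$.
To show (8), we will make use of the combinatorics that was invented in [28] to prove
the convergence of the largest eigenvalue of random sample covariance matrices.

We first expand the left side of (8):

$$E\big( \mathrm{Tr}(\hat{M}_{p,n}\hat{M}_{p,n}^*)^k \mathbf{1}_{E_n} \big) = \sum_{i_1, i_3, \ldots, i_{2k-1} = 1}^{p} \ \sum_{i_2, i_4, \ldots, i_{2k} = 1}^{n} E\big( \hat{m}_{i_1 i_2} \overline{\hat{m}}_{i_3 i_2} \cdots \hat{m}_{i_{2k-1} i_{2k}} \overline{\hat{m}}_{i_1 i_{2k}} \mathbf{1}_{E_n} \big).$$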

Then, we associate each summand on the right side with an undirected graph $G_i$ that
has vertices $\{i_1, \ldots, i_{2k}\}$ and edges $\{(i_1, i_2), (i_2, i_3), \ldots, (i_{2k-1}, i_{2k}), (i_{2k}, i_1)\}$. We read
the vertices sequentially, $i_1, i_2, i_3, \ldots, i_{2k}$, one at a time, and classify the edges into four
different types. We call an edge $(i_{s-1}, i_s)$, $s \ge 2$, an innovation if $i_s$ does not occur in
$i_1, \ldots, i_{s-1}$. An innovation $(i_{s-1}, i_s)$ is called a row innovation if $s$ is odd and a column
innovation if $s$ is even. If two edges $(i_{a-1}, i_a)$ and $(i_{b-1}, i_b)$ have the same set of vertices,
i.e., $\{i_{a-1}, i_a\} = \{i_{b-1}, i_b\}$, we say that they coincide. And an edge $(i_{a-1}, i_a)$ is said
to be single up to $i_b$, with $b > a$, if there is no other edge $(i_{c-1}, i_c)$ with $2 \le c \le b$
that coincides with $(i_{a-1}, i_a)$. For $b \ge 3$, we call $(i_{b-1}, i_b)$ a $T_3$-edge if there is an
innovation $(i_{a-1}, i_a)$, $a < b$, that is single up to $i_{b-1}$ and coincides with $(i_{b-1}, i_b)$. And
finally, an edge is called a $T_4$-edge if it is neither a $T_3$-edge nor an innovation. Hence,
observing $\hat{m}_{ij} = a_{ij} \mathbf{1}_{\{|a_{ij}| \le n^{\gamma}\}} \cdot b_{ij} =: \hat{a}_{ij} b_{ij}$ and using independence, the expectation
can be rewritten as

$$E\big( \mathrm{Tr}(\hat{M}_{p,n}\hat{M}_{p,n}^*)^k \mathbf{1}_{E_n} \big) = \sum\nolimits' \sum\nolimits'' \sum\nolimits''' E\big( \hat{m}_{i_1 i_2} \overline{\hat{m}}_{i_3 i_2} \cdots \hat{m}_{i_{2k-1} i_{2k}} \overline{\hat{m}}_{i_1 i_{2k}} \mathbf{1}_{E_n} \big)$$

$$= \sum\nolimits' \sum\nolimits'' \sum\nolimits''' E\big( \hat{a}_{i_1 i_2} \overline{\hat{a}}_{i_3 i_2} \cdots \overline{\hat{a}}_{i_1 i_{2k}} \big)\, E\big( b_{i_1 i_2} b_{i_3 i_2} \cdots b_{i_1 i_{2k}} \mathbf{1}_{E_n} \big),$$

where $\sum'$ sums over all possible arrangements of the four types of the edges, $\sum''$ is
to count the total number of different canonical graphs given the arrangements of the
four types of edges, and $\sum'''$ runs through all graphs that are isomorphic to the given
canonical graph.
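
The edge classification described above is algorithmic, and a small sketch may help when following the counting argument. The code below is an illustration under stated assumptions, not the construction of [28] verbatim: row indices (odd positions) and column indices (even positions) are tagged separately so that a row label never collides with a column label, and the function name `classify_edges` is hypothetical. An edge is labelled $T_3$ when it coincides with an innovation that has appeared exactly once so far; everything else that is not an innovation is labelled $T_4$.

```python
def classify_edges(walk):
    """Classify the 2k edges of the closed walk i_1, i_2, ..., i_{2k}, i_1.

    walk[0], walk[2], ... are row indices; walk[1], walk[3], ... are column indices.
    Returns one label per edge, in the order the edges are read.
    """
    # Tag each vertex with its side (position s is 0-based, so even s means odd position).
    vertices = [("row" if s % 2 == 0 else "col", v) for s, v in enumerate(walk)]
    closed = vertices + [vertices[0]]          # append the closing edge (i_{2k}, i_1)

    seen_vertices = {closed[0]}
    occurrences = {}                           # unordered edge -> times seen so far
    innovations = set()                        # unordered edges that were innovations
    labels = []

    for s in range(1, len(closed)):
        edge = frozenset((closed[s - 1], closed[s]))
        new_vertex = closed[s]
        if new_vertex not in seen_vertices:
            seen_vertices.add(new_vertex)
            innovations.add(edge)
            labels.append("row innovation" if new_vertex[0] == "row" else "column innovation")
        elif edge in innovations and occurrences[edge] == 1:
            labels.append("T3")                # coincides with a still-single innovation
        else:
            labels.append("T4")
        occurrences[edge] = occurrences.get(edge, 0) + 1
    return labels
```

For the walk `[1, 1, 2, 1]` (so $k = 2$) the edges are read as a column innovation, a row innovation, and two $T_3$-edges; for `[1, 1, 1, 1]` there is one innovation, one $T_3$-edge and $2k - 2l = 2$ edges of type $T_4$, matching the count used in the next paragraph.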

Let $l$ be the number of $T_3$-edges. Note $l$ is also the number of innovations since
every edge must be visited at least twice, and hence $(2k - 2l)$ is the number of $T_4$-edges.
Let $r$ be the number of row innovations. We see that $\sum'$ is bounded by
$\sum_{l=1}^{k} \sum_{r=0}^{l} \binom{k}{r} \binom{k}{l-r} \binom{2k-l}{l}$. Since every row innovation $(i_{2s-2}, i_{2s-1})$ leads to a new
vertex $i_{2s-1} \in \{1, 2, \ldots, p\}$ and every column innovation $(i_{2s-1}, i_{2s})$ leads to a new
vertex $i_{2s} \in \{1, 2, \ldots, n\}$, except for the first innovation $(i_1, i_2)$, which leads to both a
new vertex $i_1 \in \{1, 2, \ldots, p\}$ and a new vertex $i_2 \in \{1, 2, \ldots, n\}$, then, on the event $E_n$,
there are at most $p\,(C p n^{\mu-1})^{r} (C n^{\mu})^{l-r}$ terms that have nonzero contributions to $\sum'''$.

Let $q$ be the number of distinct $T_4$-edges. It was shown in [28, page 519] that $\sum''$ is
bounded by $k^{2q}(q+1)^{6k-6l}$.

Finally, let $b$ be the number of $T_4$-edges among the $q$ distinct ones that coincide
with some innovations, and let $n_s$, $s = 1, 2, \ldots, b$, be the multiplicity of the $T_4$-edges
of the $s$-th such coincidence. Then, $(q - b)$ distinct $T_4$-edges do not coincide with any
innovations but only among the $T_4$-edges, and we denote by $m_t$, $t = 1, 2, \ldots, (q - b)$, the
multiplicity of the $t$-th such coincidence. These numbers have to satisfy the relation
$2k - 2l = \sum_{s=1}^{b} n_s + \sum_{t=1}^{q-b} m_t$. Hence, for such composition of four types of edges,
we can write

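$$E\big( \hat{a}_{i_1 i_2} \overline{\hat{a}}_{i_3 i_2} \cdots \overline{\hat{a}}_{i_1 i_{2k}} \big) \le \big( E|\hat{a}_{11}|^2 \big)^{l-b} \prod_{s=1}^{b} \big( E|\hat{a}_{11}|^{n_s + 2} \big) \prod_{t=1}^{q-b} \big( E|\hat{a}_{11}|^{m_t} \big)$$

$$\le L_0(n^{\gamma})^{q} \prod_{\substack{s=1 \\ n_s + 2 > \alpha}}^{b} \frac{n_s + 2}{n_s + 2 - \alpha}\, n^{\gamma(n_s + 2 - \alpha)} \prod_{\substack{t=1 \\ m_t > \alpha}}^{q-b} \frac{m_t}{m_t - \alpha}\, n^{\gamma(m_t - \alpha)},$$

where $L_0$ is a slowly varying function. Here, in the last inequality, we have used the
following classic fact for the moments of truncated, heavy tailed random variables (see,
e.g., [10, Proposition 1.5.8] or [7, Lemma A.8]),

$$E|a_{11}|^{s} \mathbf{1}_{\{|a_{11}| \le x\}} \le \begin{cases} L_0(x), & \text{if } s \le \alpha, \\ L_0(x) \frac{s}{s - \alpha} x^{s - \alpha}, & \text{if } s > \alpha. \end{cases}$$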
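
To see where the two regimes of this bound come from, consider the following worked special case. It is only an illustration under an assumption not made in the paper: $|a_{11}|$ has the exact Pareto density $\alpha t^{-\alpha-1}$ on $[1, \infty)$, and $x \ge 1$.

```latex
% Assumption for illustration: |a_{11}| has density \alpha t^{-\alpha-1} on [1,\infty), and x \ge 1.
\begin{align*}
\mathbb{E}\,|a_{11}|^{s}\mathbf{1}_{\{|a_{11}|\le x\}}
 = \int_{1}^{x} \alpha\, t^{s-\alpha-1}\,dt
 = \begin{cases}
   \dfrac{\alpha}{\alpha-s}\,\big(1-x^{s-\alpha}\big) \le \dfrac{\alpha}{\alpha-s}, & s<\alpha,\\[2ex]
   \alpha\log x, & s=\alpha,\\[1ex]
   \dfrac{\alpha}{s-\alpha}\,\big(x^{s-\alpha}-1\big) \le \dfrac{s}{s-\alpha}\,x^{s-\alpha}, & s>\alpha .
   \end{cases}
\end{align*}
```

For $s < \alpha$ the truncated moment stays bounded, for $s = \alpha$ it grows like $\log x$ (a slowly varying function), and for $s > \alpha$ it grows like $x^{s-\alpha}$ with the prefactor $\frac{s}{s-\alpha}$ that appears in the product above.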


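Observing that (i) $n_s \le 2k - 2$, $2 \le m_t \le 2k - 2$, (ii) $\frac{1}{m - \alpha}$ is bounded above, say, by
some $C_\alpha$, which only depends on $\alpha$, for all $m > \alpha$, and $m \in \mathbb{Z}$, and (iii) $\alpha > 2$, we have

$$E\big( \hat{a}_{i_1 i_2} \overline{\hat{a}}_{i_3 i_2} \cdots \hat{a}_{i_{2k-1} i_{2k}} \overline{\hat{a}}_{i_1 i_{2k}} \big) \le L_0(n^{\gamma})^{q} (2C_\alpha k)^{q}\, n^{\gamma \left[ \sum_{s=1}^{b} (n_s + 2 - \alpha)^{+} + \sum_{t=1}^{q-b} (m_t - \alpha)^{+} \right]}$$

$$\le L_0(n^{\gamma})^{q} (2C_\alpha k)^{q}\, n^{\gamma \left[ \sum_{s=1}^{b} (n_s + 2 - 2)^{+} + \sum_{t=1}^{q-b} (m_t - 2)^{+} \right]},$$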

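where $f^{+} = \max(f, 0)$. After reorganizing and combining the terms, the expectation
of the trace is then bounded by

$$E\big( \mathrm{Tr}(\hat{M}_{p,n}\hat{M}_{p,n}^*)^k \mathbf{1}_{E_n} \big) \le p \sum_{l=1}^{k} \sum_{r=0}^{l} \binom{k}{r} \binom{k}{l-r} \binom{2k-l}{l} (C p n^{\mu-1})^{r} (C n^{\mu})^{l-r}$$

$$\times \left[ \sum_{q=0}^{2k-2l} \sum_{b=0}^{q} k^{2q} (q+1)^{6k-6l} \big( 2C_\alpha k L_0(n^{\gamma}) \big)^{q}\, n^{\gamma(2k - 2l - 2q + 2b)} \right].$$
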
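We now consider the terms inside the bracket. Firstly, for $\gamma > 0$ and $n$ large enough that $n^{2\gamma} \ge 2$,

$$\sum_{b=0}^{q} n^{2\gamma b} = \frac{n^{2\gamma(q+1)} - 1}{n^{2\gamma} - 1} \le \frac{n^{2\gamma(q+1)}}{n^{2\gamma}/2} = 2 n^{2\gamma q}.$$

Next, we use the elementary inequality $(q + 1)^{z} \le (z / \log w)^{z} w^{q+1}$ for any $w > 1$, $z >
0$, $q \ge 0$. In what follows, we apply this inequality, substituting $w = 2$ and $z = 6k - 6l$,
and the $L_j(\cdot)$'s are all slowly varying functions.

$$\sum_{q=0}^{2k-2l} \sum_{b=0}^{q} k^{2q} (q+1)^{6k-6l} \big( 2C_\alpha k L_0(n^{\gamma}) \big)^{q}\, n^{\gamma(2k - 2l - 2q + 2b)}$$

$$\le \sum_{q=0}^{2k-2l} 2^{q+1} (6k/\log 2)^{6k-6l} \big( L_1(n^{\gamma}) k^3 \big)^{q}\, n^{\gamma(2k - 2l - 2q)}\, n^{2\gamma q}$$

$$\le 2 \big( 6 n^{\gamma/3} k / \log 2 \big)^{6k-6l} \cdot \sum_{q=0}^{2k-2l} \big( 2 L_1(n^{\gamma}) k^3 \big)^{q}$$

$$\le \big( 6 n^{\gamma/3} k / \log 2 \big)^{6k-6l} \big( L_2(n^{\gamma}) k^3 \big)^{2k-2l} \le \Big( k^2 \big( n^{\gamma} L_3(n^{\gamma}) \big)^{1/3} \Big)^{6k-6l}.$$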

Next, using the combinatorial inequality that for any (i > 0 (see [28, Lemma 2.1]), 


k\ f k \ f 2k — I 


I 


r j \ l — r 




k\ f2V 

2r)' 


we get 


E(TV(Mp,„Af*„)UEj 


<M 1 + ^)“E ()) E (t"(«''L3(n7)‘/='A-‘/») 

^ ^ r=0 ^ ^ 

k 

< P(n)(l + V6f’^ ft) (1 + \^f{Cn>^y (fc 2 (n'^L 3 (nT'))i/ 3 < 5 - 1/6 


/=1 


6fc—6/ 


6 /c— 6 / 


(1 + '/6)‘^Cn^{l + 'sfpjnf' + 


< P{n) 


Since /i < 27 ', and for any 5 > 0, k = \rp ''\, 


k\rPU{rP)f/H-^/^X ~ n2(^+6V') = ^(n^V), 


we get ( 8 ). 


□ 


Remark 6. All the proofs in this subsection (and further on) only used sparseness to
determine the number of nonzero entries in each row and column. The exact location
of the nonzero entries plays no role in the proof. Hence, all our results hold for a
larger class of sample covariance matrices including those constructed from banded
rectangular matrices.

3.4. Perturbations of eigenvalues and eigenvectors. The next two results are
classical tools of perturbation theory of eigenvalues.

Theorem 3.5 (Cauchy interlacing theorem). Let $1 \le p \le n$.

(a) Let $A_n$ be an $n \times n$ Hermitian matrix and $A_{n-1}$ be its $(n - 1) \times (n - 1)$ minor,
then $\lambda_1(A_n) \ge \lambda_1(A_{n-1}) \ge \lambda_2(A_n) \ge \cdots \ge \lambda_{n-1}(A_{n-1}) \ge \lambda_n(A_n)$;

(b) Let $A_{p,n}$ be a $p \times n$ matrix and $A_{(p-1),n}$ be its $(p - 1) \times n$ minor, then

$$\sigma_1(A_{p,n}) \ge \sigma_1(A_{(p-1),n}) \ge \sigma_2(A_{p,n}) \ge \cdots \ge \sigma_{p-1}(A_{(p-1),n}) \ge \sigma_p(A_{p,n});$$

(c) If $p < n$ and $A_{p,n}$ is a $p \times n$ matrix and $A_{p,(n-1)}$ its $p \times (n - 1)$ minor, then

$$\sigma_1(A_{p,n}) \ge \sigma_1(A_{p,(n-1)}) \ge \sigma_2(A_{p,n}) \ge \cdots \ge \sigma_{p-1}(A_{p,(n-1)}) \ge \sigma_p(A_{p,n}),$$

where in (b) and (c), $\sigma_i(\cdot)$ denotes the $i$-th largest singular value.

Proof. See for instance [27, Lemma 22]. □
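
Parts (a) and (b) are easy to check numerically. The sketch below is a sanity check on random test matrices, not a proof: the Gaussian matrices, the sizes, the tolerance and the helper name `interlaces` are all illustrative assumptions.

```python
import numpy as np

def interlaces(big, small, tol=1e-10):
    """Check big[k] >= small[k] >= big[k+1] for two descending sequences."""
    return all(big[k] + tol >= small[k] >= big[k + 1] - tol for k in range(len(small)))

rng = np.random.default_rng(2)

# Part (a): Hermitian matrix and the minor obtained by deleting the last row and column.
n = 6
g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
a = (g + g.conj().T) / 2
eig_full = np.linalg.eigvalsh(a)[::-1]            # descending order
eig_minor = np.linalg.eigvalsh(a[:-1, :-1])[::-1]
hermitian_ok = interlaces(eig_full, eig_minor)

# Part (b): p x n matrix and the (p-1) x n matrix obtained by deleting the last row.
p = 4
m = rng.standard_normal((p, n))
sv_full = np.linalg.svd(m, compute_uv=False)      # already in descending order
sv_minor = np.linalg.svd(m[:-1, :], compute_uv=False)
singular_ok = interlaces(sv_full, sv_minor)
```

In the proof of Theorem 2.1 this is applied to $M_{p,n}$ with one row removed, which is exactly the situation of part (b).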

Theorem 3.6 (Perturbation of eigenvalues and eigenvectors). Let $A$ be a Hermitian
matrix and $v$ be a unit vector. Let $\zeta = \langle v, Av \rangle$ and $\epsilon = \|(A - \zeta)v\|$.

(a) There exists an eigenvalue $\lambda_\epsilon$ of $A$ in the closed ball $\bar{B}(\zeta, \epsilon)$.

(b) If $\lambda_\epsilon$ is the only eigenvalue in $\bar{B}(\zeta, \epsilon)$ with corresponding unit eigenvector $v_\epsilon$, and all
other eigenvalues are at distance at least $d > \epsilon$ of $\zeta$, then $\|v_\epsilon - P_v(v_\epsilon)\| \le \frac{2\epsilon}{d - \epsilon}$.

Proof. A proof can be found in [11, page 77] or [7, Proposition A.1]. □
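
Part (a) can be illustrated numerically as follows. This is a sketch with an arbitrary real symmetric test matrix and a random unit vector (both are assumptions made for the example); it computes $\zeta$, the residual $\epsilon$, and the distance from $\zeta$ to the nearest eigenvalue, which part (a) says is at most $\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8

# Arbitrary real symmetric (hence Hermitian) test matrix.
g = rng.standard_normal((n, n))
a = (g + g.T) / 2

# Arbitrary unit vector v, playing the role of an approximate eigenvector.
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

zeta = v @ a @ v                           # zeta = <v, A v>
eps = np.linalg.norm(a @ v - zeta * v)     # eps = ||(A - zeta) v||

# Distance from zeta to the spectrum of A.
nearest_gap = np.min(np.abs(np.linalg.eigvalsh(a) - zeta))
```

In Section 4 the theorem is applied with $v$ a canonical basis vector $e_{i_l}$ and $A$ a rescaled copy of $\Sigma_{p,n}$, so that a small residual forces an eigenvalue near $|m_{i_l j_l}|^2$.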


3.5. Convergence of ESD. In this subsection, we state the convergence of the corresponding
empirical spectral measures. We assume $\mu > 0$ for the next proposition.

Proposition 3.7. Suppose $\alpha > 2$, $\mu \in (0,1]$ and $x$ has variance one. Let $\Sigma_{p,n}$ be the
sparse heavy tailed sample covariance matrix. Then the empirical spectral distribution
of $\Sigma_{p,n}/n^{\mu}$ converges almost surely to the Marchenko-Pastur law with density

$$f_\rho(x) = \frac{\sqrt{(\lambda_{+} - x)(x - \lambda_{-})}}{2\pi\rho x} \mathbf{1}_{[\lambda_{-}, \lambda_{+}]}(x), \qquad \text{with } \lambda_{\pm} = (1 \pm \sqrt{\rho})^2.$$

Proof of Proposition 3.7. The proof follows from the classic truncation and moment
method for random matrices (see for instance [1, Exercise 2.1.18]). Normalizing the
entries $m_{ij}$ by $n^{\mu/2}$ gives the desired variance:

$$\mathrm{Var}(n^{-\mu/2} m_{ij}) = n^{-\mu} \mathrm{Var}(m_{ij}) = n^{-\mu} n^{\mu - 1} = \frac{1}{n}.$$

□


4. Proof of the main theorems 


4.1. Proof of Theorem 2.1. 
4.1.1. Proof strategy. We will use the strategy that was first proposed by Soshnikov
[26] when proving the heavy tailed Hermitian matrix case with $0 < \alpha < 2$. This idea
was later developed in [2] for proving the Hermitian case and the sample covariance
matrix case when $0 < \alpha < 4$ and used in [7] for proving the band Hermitian matrix
case when $\alpha > 0$.

The strategy is as follows. We first show that the convergence holds when $l = 1$, i.e.,
$\lambda_1(\Sigma_{p,n}) / |m_{i_1 j_1}|^2 \xrightarrow{P} 1$. Then, we remove the $i_1$-th row from $M_{p,n}$. Lemma 3.1 guarantees that,
with high probability, the second largest entry will not be removed. The convergence for
the second largest eigenvalue and the second largest entry follows from Theorem 3.5 and
the same argument for the $l = 1$ case. Iterating this process, one proves $\lambda_l(\Sigma_{p,n}) / |m_{i_l j_l}|^2 \xrightarrow{P} 1$
for each $l$ fixed.


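4.1.2. Eigenvalues. We begin by computing the two-sided tail probability of $|m_{ij}|^2$.
For any $t > 0$,

$$P(|m_{ij}|^2 > t) = P(|m_{ij}| > \sqrt{t}) = L(\sqrt{t})\, n^{\mu - 1} t^{-\alpha/2}.$$

Since $L(\sqrt{t})$ is also a slowly varying function in $t$, $[|m_{ij}|^2]_{i,j=1}^{p,n}$ is a sparse heavy tailed
random matrix of $p \times n$ independent entries with parameters $\mu$ and $\alpha/2$. Classic extreme
value theory tells us that the random point process $Q_n$, defined in (2), converges to
the desired Poisson point process with intensity $\frac{\alpha}{2} x^{-1-\alpha/2}$. In particular, $c_{np}^{-2} |m_{i_1 j_1}|^2$
converges to a Fréchet distribution with parameter $\alpha/2$.

We next show that the largest eigenvalue of $\Sigma_{p,n}$ behaves like the square of the largest
entry of $M_{p,n}$ ($l = 1$ case), i.e., $\lambda_1(\Sigma_{p,n}) / |m_{i_1 j_1}|^2 \xrightarrow{P} 1$. Since $\Sigma_{p,n}$ is positive semidefinite,
$\lambda_1(\Sigma_{p,n}) \ge \langle \Sigma_{p,n} v, v \rangle = v^* M_{p,n} M_{p,n}^* v$ for any unit vector $v$. Hence, we can choose
$v = e_{i_1}$, which gives

$$\lambda_1(\Sigma_{p,n}) \ge e_{i_1}^* M_{p,n} M_{p,n}^* e_{i_1} = |m_{i_1 j_1}|^2 + \sum_{j \ne j_1, j = 1}^{n} |m_{i_1 j}|^2 \ge |m_{i_1 j_1}|^2,$$

and it suffices to prove the reverse direction, i.e., $\forall \epsilon > 0$,

(9) $$P\big( \lambda_1(\Sigma_{p,n}) > |m_{i_1 j_1}|^2 (1 + \epsilon) \big) \to 0, \qquad \text{as } n \to \infty.$$

We use the infinity norm of $\Sigma_{p,n}$ to bound $\lambda_1(\Sigma_{p,n})$ and truncate the matrix $M_{p,n}$,
when necessary.

• Case I: $0 < \alpha < 1 + \mu^{-1}$.

In this case, we can directly show (9). Observing that

$$\lambda_1(\Sigma_{p,n}) = \|\Sigma_{p,n}\| = \|M_{p,n} M_{p,n}^*\| \le \|M_{p,n}\|^2 \le \|M_{p,n}\|_1 \|M_{p,n}\|_\infty,$$

it suffices to show that with probability tending to one,

(10) $$\|M_{p,n}\|_\infty \le |m_{i_1 j_1}| (1 + o(1)),$$

(11) $$\|M_{p,n}\|_1 \le |m_{i_1 j_1}| (1 + o(1)).$$
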
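The proof of (11) will be almost identical to (10), by switching the role of $p$ and $n$.
We hence show (10) only. Lemma 3.1 (a) says, with probability going to 1, there is no
row that has two entries with absolute value greater than $c_{np}^{\kappa}$, where $\kappa = \frac{1+2\mu}{2(1+\mu)} + \delta$, and
$\delta > 0$ can be chosen arbitrarily small. Consider the following summation and break it
into three pieces.

$$S_i := \sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{|m_{ij}| \le c_{np}^{\kappa}\}}$$

$$= \sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{|m_{ij}| \le n^{\frac{\mu}{\alpha} - \eta}\}} + \sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{n^{\frac{\mu}{\alpha} - \eta} < |m_{ij}| \le n^{\frac{\mu}{\alpha} + \eta}\}} + \sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{n^{\frac{\mu}{\alpha} + \eta} < |m_{ij}| \le c_{np}^{\kappa}\}}$$

$$=: S_{i,1} + S_{i,2} + S_{i,3},$$

where we choose $\eta \in \big(0, \min\{\frac{\mu}{\alpha}, \frac{1}{2\alpha(\alpha+1)}\}\big)$.

By Lemma 3.2 (a), w.e.h.p., for any $\epsilon > 0$, $S_{i,1} \le n^{\mu + (\frac{\mu}{\alpha} - \eta)(1 - \alpha)^{+} + \epsilon}$, which is $o(n^{\frac{\mu+1}{\alpha}})$.
To see this, if $\alpha < 1$, set $\epsilon = \eta(1 - \alpha) > 0$, and w.e.h.p.,

$$S_{i,1} \le n^{\mu + (\frac{\mu}{\alpha} - \eta)(1 - \alpha) + \eta(1 - \alpha)} = n^{\frac{\mu}{\alpha}} = o(n^{\frac{\mu+1}{\alpha}}).$$

If $1 \le \alpha < 1 + \mu^{-1}$, then $(1 - \alpha)^{+} = 0$ and hence $S_{i,1} \le n^{\mu + \epsilon}$. But $\alpha < 1 + \mu^{-1}$ implies
$\mu < \frac{\mu+1}{\alpha}$, so for $\epsilon$ sufficiently small, $n^{\mu + \epsilon} = o(n^{\frac{\mu+1}{\alpha}})$.

By Lemma 3.2 (c), w.e.h.p., $S_{i,2} \le n^{\frac{\mu}{\alpha} + \epsilon}$, $\forall \epsilon > \eta(\alpha + 1)$. Since $\eta < \frac{1}{2\alpha(\alpha+1)}$, we can
choose $\epsilon = \frac{1}{2\alpha}$, and this gives w.e.h.p., $S_{i,2} \le n^{\frac{\mu}{\alpha} + \frac{1}{2\alpha}} = o(n^{\frac{\mu+1}{\alpha}})$ as desired.

Finally, since $\frac{1}{2} + \delta \le \kappa \le \frac{3}{4} + \delta < 1$, $c_{np}^{\kappa} \sim n^{\frac{(\mu+1)\kappa}{\alpha}}$, and by Lemma 3.2 (d), we can choose
$\frac{(\mu+1)\kappa}{\alpha} < \gamma < \frac{\mu+1}{\alpha}$ such that $S_{i,3} \le n^{\gamma} = o(n^{\frac{\mu+1}{\alpha}})$. Hence, w.e.h.p., $S_i = o(n^{\frac{\mu+1}{\alpha}})$.
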

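The sum of absolute values in row $i$ of $M_{p,n}$ can be written as

$$\tilde{S}_i := \sum_{j=1}^{n} |m_{ij}| = S_i + \sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{|m_{ij}| > c_{np}^{\kappa}\}}.$$

Moreover, a crude union bound and Lemma 3.1(b) give us

$$P\Big( \exists i, \max_{1 \le j \le n} |m_{ij}| > c_{np}^{\kappa},\ \tilde{S}_i - \max_{1 \le j \le n} |m_{ij}| > c_{np}^{1 - \epsilon_0} \Big) \to 0$$

for some $\epsilon_0 > 0$ sufficiently small, which implies (10). Hence we have proved that

$$\frac{\lambda_1(\Sigma_{p,n})}{|m_{i_1 j_1}|^2} \xrightarrow{P} 1.$$

Next, we show that, with probability tending to one, $\Sigma_{p,n}$ has eigenvalues at $|m_{i_l j_l}|^2 (1 +
o(1))$. We compute the $l$-th residual $r_l$, for $l \ge 1$, i.e.,

(12) $$r_l := \Sigma_{p,n} e_{i_l} - |m_{i_l j_l}|^2 e_{i_l},$$

and hence

$$r_l = \Big( \sum_{k=1}^{n} m_{1k} \overline{m}_{i_l k},\ \ldots,\ \sum_{k=1, k \ne j_l}^{n} |m_{i_l k}|^2,\ \ldots,\ \sum_{k=1}^{n} m_{pk} \overline{m}_{i_l k} \Big)^{T}.$$
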


The norm of is o(c^p). To see this, we compute the ||r;|| explicitly: 




n 

2 

n 


ulb = 

E 


+ 




S = 1 

.k=l 


k=l 

k^jl 

/ 


< ^ \mi^k\ ^ ^ \mi^kY 


k=l 


s=l 

Si^H 


k=l 

k¥^jl 


( 


< \^iik\ Y Y + 


k=l s=l 

ki^jl 


s=l 

Si^H 




k=l 


X^T^h ) 


By Lemma 3.1 and 3.2 (a), (c) and (d), and a calculation similar to that when we
bound the row sum of $M_{p,n}$, one can see that $\|r_l\| = o(c_{np}^2)$, with probability tending
to one. Hence, letting $\tilde{r}_l = c_{np}^{-2} r_l$, we have

$$c_{np}^{-2} \Sigma_{p,n} e_{i_l} = c_{np}^{-2} |m_{i_l j_l}|^2 e_{i_l} + \tilde{r}_l,$$

where $\|\tilde{r}_l\| \to 0$. It then follows from Theorem 3.6 that $c_{np}^{-2} \Sigma_{p,n}$ has eigenvalues $c_{np}^{-2} |m_{i_l j_l}|^2 (1 +
o(1))$. Hence, with probability tending to one, $\lambda_l(\Sigma_{p,n}) \ge |m_{i_l j_l}|^2 (1 + o(1))$.

To show that these are exactly the largest eigenvalues (where the case $l = 1$ is proved),
we use Theorem 3.5. When $l = 2$, let $M_{p,n}^{(1)}$ be the submatrix of $M_{p,n}$ obtained by removing
the $i_1$-th row and let $\Sigma_{p,n}^{(1)} := M_{p,n}^{(1)} M_{p,n}^{(1)*}$. By Lemma 3.1(a), with probability
going to one, the second largest entry of $M_{p,n}$ (in absolute value), $m_{i_2 j_2}$, will remain in
$M_{p,n}^{(1)}$. Using the infinity norm bound on $\Sigma_{p,n}^{(1)}$ and the same argument as in the proof
for $\lambda_1(\Sigma_{p,n})$, we have, with probability tending to one,

$$\lambda_2(\Sigma_{p,n}) \le \lambda_1(\Sigma_{p,n}^{(1)}) = |m_{i_2 j_2}|^2 (1 + o(1)),$$

where the first inequality is due to the interlacing of eigenvalues. The claim for general
$\lambda_l(\Sigma_{p,n})$ then follows from iterating the above argument.

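• Case II: $1 + \mu^{-1} < \alpha < 2(1 + \mu^{-1})$.

For this case, in order to show (9), we choose $\gamma, \gamma' > 0$ such that

$$0 < \frac{\mu}{\alpha} - \frac{1}{\alpha(\alpha - 1)} < \gamma < \frac{\mu + 1}{\alpha}, \qquad \max\Big( \gamma, \frac{\mu}{2} \Big) < \gamma' < \frac{\mu + 1}{\alpha},$$

which is always possible if $1 + \mu^{-1} < \alpha < 2(1 + \mu^{-1})$. We truncate the entries of $M_{p,n}$
at $n^{\gamma}$. Let $\hat{M}_{p,n} := [\hat{m}_{ij}]_{i,j=1}^{p,n} = [m_{ij} \mathbf{1}_{\{|m_{ij}| \le n^{\gamma}\}}]_{i,j=1}^{p,n}$, and $M'_{p,n} = M_{p,n} - \hat{M}_{p,n}$ be the
truncated part and the remaining part of $M_{p,n}$ respectively.
We decompose $\Sigma_{p,n}$ as below:

$$\Sigma_{p,n} = (\hat{M}_{p,n} + M'_{p,n})(\hat{M}_{p,n} + M'_{p,n})^*$$

$$= \hat{M}_{p,n}\hat{M}_{p,n}^* + \big( \hat{M}_{p,n} M'^{*}_{p,n} + M'_{p,n} \hat{M}_{p,n}^* + M'_{p,n} M'^{*}_{p,n} \big)$$

$$=: \hat{\Sigma}_{p,n} + \Sigma'_{p,n}.$$

Using the triangle inequality, we have

$$\lambda_1(\Sigma_{p,n}) = \|\Sigma_{p,n}\| \le \|\hat{M}_{p,n} + M'_{p,n}\|^2 \le \big[ \|\hat{M}_{p,n}\| + \|M'_{p,n}\| \big]^2$$

$$\le \|\hat{\Sigma}_{p,n}\| + 2\|\hat{\Sigma}_{p,n}\|^{1/2} \big( \|M'_{p,n}\|_1 \|M'_{p,n}\|_\infty \big)^{1/2} + \|M'_{p,n}\|_1 \|M'_{p,n}\|_\infty.$$

Hence, we will prove, with probability tending to one,

(13) $$\|\hat{\Sigma}_{p,n}\| = o(c_{np}^2),$$

(14) $$\|M'_{p,n}\|_\infty \le |m_{i_1 j_1}| (1 + o(1)), \qquad \|M'_{p,n}\|_1 \le |m_{i_1 j_1}| (1 + o(1)),$$

which gives (9). For (13), first, one can deduce from the lower bound of $\gamma$ that $\mu +
\gamma(1 - \alpha) < \frac{1 + \mu}{\alpha}$, and hence,

$$\big| \|\hat{M}_{p,n}\| - \|\hat{M}_{p,n} - E\hat{m}_{ij}\| \big| \le \sqrt{pn}\, |E\hat{m}_{ij}| \le C L(n^{\gamma}) n^{\mu + \gamma(1 - \alpha)} = o(n^{\frac{1+\mu}{\alpha}}) = o(c_{np}).$$

So we may assume that the truncated entries are centered. Here, the first inequality
is a consequence of [3, Theorem A.46] and the second inequality is due to [2, Lemma
13]. Theorem 3.4 indicates that

$$P(\|\hat{\Sigma}_{p,n}\| > C n^{2\gamma'}) \to 0.$$

Now with $\gamma'$ chosen such that $\gamma' < \frac{\mu + 1}{\alpha}$, (13) holds with probability tending to one.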

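For (14), again, we only show the upper bound for the infinity norm. As in the previous
case, it is enough to show that for any $1 \le i \le p$ fixed, w.e.h.p.,

$$S_i := \sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{n^{\gamma} < |m_{ij}| \le c_{np}^{\kappa}\}} = o(n^{\frac{\mu+1}{\alpha}}).$$

We treat $S_i$ similarly:

$$S_i = \sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{n^{\gamma} < |m_{ij}| \le n^{\frac{\mu}{\alpha} - \eta}\}} + \sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{n^{\frac{\mu}{\alpha} - \eta} < |m_{ij}| \le n^{\frac{\mu}{\alpha} + \eta}\}} + \sum_{j=1}^{n} |m_{ij}| \mathbf{1}_{\{n^{\frac{\mu}{\alpha} + \eta} < |m_{ij}| \le c_{np}^{\kappa}\}}$$

$$=: S_{i,1} + S_{i,2} + S_{i,3}.$$

Here, the only difference from the previous case is the $S_{i,1}$ term. By Lemma 3.2 (b),
for arbitrarily small $\delta > 0$, w.e.h.p., $S_{i,1} \le n^{\mu - \gamma(\alpha - 1) + \delta}$. The choice of $\gamma > \frac{\mu}{\alpha} - \frac{1}{\alpha(\alpha - 1)}$
guarantees $S_{i,1} = o(n^{\frac{\mu+1}{\alpha}})$.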

The part applying the Cauchy interlacing theorem is identical to Case I, so it remains to
show that, with probability going to one, the norm of $r_l$, as defined in (12), is of smaller
order than $c_{np}^2$ as $n \to \infty$. We estimate $\|r_l\|$ using the triangle inequality
and the decomposition of $\Sigma_{p,n}$ above:


rq 


< ||Sp,„|| + ||M;„Mp>,J| + ||Mp,„M;>J| + - 

^ I I I I OlIV' lll/2/ll7i/r/ II Il7i.r/ II \1/2 I IIti/t/ h /t/* 


\mi 


< IIS. 


In view of (13) and (14), it remains to show that, with probability going to one,

(15) \ = o{clp). 


We compute the left side directly, which yields 


\\^p,n^^p,n^il 11 — 1^*1^ I l-{|mijfc|>m'} | l{|m,sfc |>nT'} 


fc=l 


s=l 


( 




S=1 




{\mi k\>n"<} 


1 k=l 




Using Lemma 3.1 and 3.2 (b), (c) and (d), we see that each summation above is
$o(c_{np})$ with probability tending to one and hence (15) is proved. We conclude that
$\|r_l\| = o(c_{np}^2)$ with probability tending to one and the proof is complete.


4.1.3. Eigenvectors. 

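We consider the matrix $c_{np}^{-2}\Sigma_{p,n}$. Let $v = e_{i_1}$ and $C = \langle v, c_{np}^{-2}\Sigma_{p,n}v\rangle = c_{np}^{-2}\sum_{j=1}^{n}|m_{i_1j}|^2$.
Then

$\epsilon := \|(c_{np}^{-2}\Sigma_{p,n} - C)v\| \le c_{np}^{-2}\|(\Sigma_{p,n} - |m_{i_1j_1}|^2 I)e_{i_1}\| + c_{np}^{-2}\|(c_{np}^{2}C - |m_{i_1j_1}|^2)e_{i_1}\|$

$\le c_{np}^{-2}\|r_1\| + c_{np}^{-2}\sum_{j=1,\, j\ne j_1}^{n}|m_{i_1j}|^2.$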

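By Lemma 3.1 and 3.2, we know $\epsilon \to 0$ in probability. Hence, in order to use Theorem 3.6, it suffices
to show that for sufficiently small $\delta > 0$, with probability tending to one, $\lambda_1$ is the only
eigenvalue in the interval $[C-\delta, C+\delta]$, i.e., for each $k \ge 1$, the spacing of the eigenvalues satisfies

$\lim_{\delta\to 0}\limsup_{n\to\infty}\mathbb{P}\left(\frac{\lambda_k(\Sigma_{p,n}) - \lambda_{k+1}(\Sigma_{p,n})}{c_{np}^2} \le \delta\right) = 0.$

Since we have proved $\lambda_k(\Sigma_{p,n})/|m_{i_kj_k}|^2 \to 1$, this is equivalent to

$\lim_{\delta\to 0}\limsup_{n\to\infty}\mathbb{P}\left(c_{np}^{-2}(|m_{i_kj_k}|^2 - |m_{i_{k+1}j_{k+1}}|^2) \le \delta\right) = 0.$

However, this follows from the fact that $Q_n = \sum_{i,j}\delta_{c_{np}^{-2}|m_{ij}|^2}$ converges to a
Poisson point process on $(0,+\infty)$.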
4.2. Proof of Theorem 2.2.


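4.2.1. Eigenvalues. Proposition 3.7 implies that for any $k \ge 1$ fixed, and any $\epsilon > 0$,

$\mathbb{P}(\lambda_k(\Sigma_{p,n}) > (1+\sqrt{\rho})^2(1-\epsilon)n^{\mu}) \to 1$, as $n \to \infty$.

It remains to prove the upper bound, i.e., $\mathbb{P}(\lambda_k(\Sigma_{p,n}) < (1+\sqrt{\rho})^2(1+\epsilon)n^{\mu}) \to 1$.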

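Since $\lambda_k(\Sigma_{p,n}) \le \lambda_1(\Sigma_{p,n})$ and since we can decompose $\Sigma_{p,n}$ in the same way as in the
previous case, it is enough to show that

$\mathbb{P}\left(\|\hat\Sigma_{p,n}\| + 2\|\hat\Sigma_{p,n}\|^{1/2}(\|M'_{p,n}\|_1\|M'_{p,n}\|_\infty)^{1/2} + \|M'_{p,n}\|_1\|M'_{p,n}\|_\infty > (1+\sqrt{\rho})^2(1+\epsilon)n^{\mu}\right)$

goes to zero. In this regime, we choose $\gamma' = \mu/2$ and $\gamma \in \left(\frac{\mu}{2(\alpha-1)}, \frac{\mu}{2}\right)$, which is always
possible when $\alpha > 2$. Such $\gamma$ and $\gamma'$ satisfy the assumptions in Theorem 3.4, which
gives the bound for the truncated part, i.e.,

$\mathbb{P}(\|\hat\Sigma_{p,n}\| > (1+\sqrt{\rho})^2(1+\epsilon)n^{\mu}) \to 0.$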

It remains to show that $\|M'_{p,n}\|_1 = o(n^{\mu/2})$ and $\|M'_{p,n}\|_\infty = o(n^{\mu/2})$ with probability
tending to one. Again, we prove this for the infinity norm only, i.e., with probability going
to one,

$S_i := \sum_{j=1}^{n}|m_{ij}|1_{\{|m_{ij}| > n^{\gamma}\}} = o(n^{\mu/2})$, for all $1 \le i \le n$.

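Since $c_{np}$ is of order $n^{\frac{\mu+1}{\alpha}}$ up to a slowly varying factor and $c_{np}^{-2}|m_{i_1j_1}|^2$ converges in distribution, for any $\theta > \frac{\mu+1}{\alpha}$,
with probability tending to one, $\max_{i,j}|m_{ij}| \le n^{\theta}$. Hence, with probability tending
to one, for all $1 \le i \le n$,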


we can split the sum according to the size of $|m_{ij}|$ into three consecutive ranges between $n^{\gamma}$ and $n^{\theta}$ and write

$S_i := S_{i,1} + S_{i,2} + S_{i,3}.$


By Lemma 3.2 (b), (c) and (d), respectively, for any $\epsilon > 0$, we have w.e.h.p.,


S,,i < n/^-7(«-i)+^, < nV+^ Si ,3 < 


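As $\alpha > 2(1+\mu^{-1})$, we have $\frac{\mu+1}{\alpha} < \frac{\mu}{2}$. We can choose $\theta$ arbitrarily close to
$\frac{\mu+1}{\alpha}$ and $\epsilon > 0$ small enough such that, w.e.h.p., $S_{i,2} + S_{i,3} = o(n^{\mu/2})$. Moreover, the
choice of $\gamma > \frac{\mu}{2(\alpha-1)}$ implies $\mu - \gamma(\alpha-1) < \frac{\mu}{2}$. Making $\epsilon$ small enough, we get,
w.e.h.p., $S_{i,1} = o(n^{\mu/2})$. Therefore, with probability tending to one, for all $1 \le i \le n$,
$S_i = o(n^{\mu/2})$, hence $\|M'_{p,n}\|_\infty = o(n^{\mu/2})$. The proof is complete.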
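The exponent bookkeeping in this regime can be checked mechanically. The sketch below is only an illustration under assumed values of $\alpha$, $\mu$ and $\gamma$; the function names and the labels "poisson" and "constant" are hypothetical and simply mirror the two regimes $\alpha < 2(1+\mu^{-1})$ and $\alpha > 2(1+\mu^{-1})$. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def regime(alpha, mu):
    """Classify (alpha, mu) by comparing alpha with the threshold 2(1 + 1/mu)."""
    critical = 2 * (1 + 1 / mu)
    if alpha < critical:
        return "poisson"
    if alpha > critical:
        return "constant"
    return "critical"

def check_exponents(alpha, mu, gamma):
    """Check the exponent inequalities used when alpha > 2(1 + 1/mu)."""
    half_mu = mu / 2
    return {
        # gamma must lie in (mu / (2(alpha - 1)), mu / 2)
        "gamma_in_range": mu / (2 * (alpha - 1)) < gamma < half_mu,
        # exponent of the bound on S_{i,1} stays below mu / 2
        "S1_exponent_ok": mu - gamma * (alpha - 1) < half_mu,
        # the largest entry, of order n^{(mu+1)/alpha}, is o(n^{mu/2})
        "max_entry_ok": (mu + 1) / alpha < half_mu,
    }

# Assumed example values: 2(1 + 1/mu) = 6 < alpha = 7.
alpha, mu = Fraction(7), Fraction(1, 2)
gamma = Fraction(1, 8)  # lies in (1/24, 1/4)
print(regime(alpha, mu), check_exponents(alpha, mu, gamma))
```

With these values all three inequalities hold, so the truncation level $n^{\gamma}$ is admissible in the sense used above.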
4.2.2. Eigenvectors. Last, we prove the localization result of the eigenvectors of $\Sigma_{p,n}$.
We use the following simple linear algebra lemma [7, Lemma 4.2], which we quote without
proof.


Lemma 4.1. Let $H$ be a Hermitian matrix and $\rho_L(H)$ be the maximum spectral radius
of its $L \times L$ principal sub-matrices. Let $\lambda$ be an eigenvalue of $H$ and $v$ an associated
unit eigenvector. If $v$ is $(L,\eta)$-localized, then

$|\lambda| \le \frac{\rho_L(H) + \sqrt{\eta}\,\|H\|}{\sqrt{1-\eta}}.$
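For orientation only, here is a short sketch of why a bound of this type holds. It is not the proof given in [7], and it assumes that $(L,\eta)$-localized means there is an index set $J$ with $|J| = L$ and $\sum_{j\in J}|v_j|^2 \ge 1-\eta$, with $0 \le \eta < 1$; the projection $P_J$ onto the coordinates in $J$ is notation introduced here.

```latex
% Sketch under the stated assumption on (L,\eta)-localization.
\begin{align*}
  Hv = \lambda v
  &\;\Longrightarrow\;
  \lambda P_J v = P_J H P_J v + P_J H (1 - P_J) v, \\
  |\lambda|\,\|P_J v\|
  &\le \|P_J H P_J\|\,\|P_J v\| + \|H\|\,\|(1 - P_J) v\|
   \le \rho_L(H)\,\|P_J v\| + \sqrt{\eta}\,\|H\|, \\
  |\lambda|
  &\le \rho_L(H) + \frac{\sqrt{\eta}\,\|H\|}{\sqrt{1-\eta}}
   \le \frac{\rho_L(H) + \sqrt{\eta}\,\|H\|}{\sqrt{1-\eta}},
\end{align*}
% using \|P_J v\| \ge \sqrt{1-\eta} and \|(1-P_J)v\| \le \sqrt{\eta}.
```

The second line uses that $P_J H P_J$ acts as an $L \times L$ principal sub-matrix of $H$, whose norm equals its spectral radius because it is Hermitian.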



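If $\alpha > 2(1+\mu^{-1})$ then, by Theorem 3.4, we know that $\|\Sigma_{p,n}\|$ is of order $(1+\sqrt{\rho})^2n^{\mu}$.
In view of Lemma 4.1, it suffices to show that there exists $\eta > 0$ such that, with
probability going to one,

$\rho_{\lfloor n^{\beta}\rfloor}(\Sigma_{p,n}) < (\sqrt{1-\eta} - \sqrt{\eta})(1+\sqrt{\rho})^2n^{\mu}.$

In other words, we must establish that there exists $\epsilon > 0$ such that, with probability
going to one, any $\lfloor n^{\beta}\rfloor \times \lfloor n^{\beta}\rfloor$ principal sub-matrix $W$ of $\Sigma_{p,n}$ satisfies $\|W\| < (1+\epsilon)n^{\mu}$.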

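We proceed as follows. A principal sub-matrix $W$ is obtained by choosing $\lfloor n^{\beta}\rfloor$
rows of the rectangular matrix $M_{p,n}$ and writing $W = M_IM_I^*$, where
$I = \{i_1,\ldots,i_{\lfloor n^{\beta}\rfloor}\}$. Here, the notation $M_I$ stands for the $\lfloor n^{\beta}\rfloor \times n$ sub-matrix of
$M$ formed by the rows with indices in $I$. As before, we write $\hat M$ and $M'$ for the
truncation and remainder of the matrix $M$ at level $n^{\gamma}$, for $\gamma \in \left(\frac{\mu}{2(\alpha-1)}, \frac{\mu}{2}\right)$. Then

$\|W\| \le \|\hat M_I\hat M_I^*\| + 2\|\hat M_I\hat M_I^*\|^{1/2}(\|M'_I\|_1\|M'_I\|_\infty)^{1/2} + \|M'_I\|_1\|M'_I\|_\infty.$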

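For any choice of $I$, $\|M'_IM'^*_I\| \le \|M'_{p,n}\|_1\|M'_{p,n}\|_\infty = o(n^{\mu})$ as in the proof in Section
4.2.1. On the other hand, for any choice of $I$, one can adapt the proof of Theorem 3.4
to deal with the case of $p = \lfloor n^{\beta}\rfloor$ rows to show that for any $1 < c < (1+\sqrt{\rho})^2$ there
exist $\theta = \theta(c) > 0$ and $\gamma' = \gamma'(c) > 0$ so that

$\mathbb{P}(\|\hat M_I\hat M_I^*\| > cn^{\mu}) \le P(n)\,n^{-\theta\lfloor n^{\gamma'}\rfloor},$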


where $P$ is a polynomial in $n$. Indeed, in the case where $p \to \infty$ and $p/n \to 0$ one needs
to control the appearance of odd and even innovations (or odd/even marked vertices
as in [21, Section 2.2]). We leave the details to the reader. Since there are at most
$n^{\lfloor n^{\beta}\rfloor}$ ways to choose the indices in $I$, the probability of the existence of such a principal
sub-matrix is bounded above by:

$P(n)\,n^{-\theta\lfloor n^{\gamma'}\rfloor}\,n^{\lfloor n^{\beta}\rfloor}.$

Thus if we choose $\beta < \gamma'$, we obtain the desired result.
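To make the role of the condition $\beta < \gamma'$ explicit, here is a worked version of the union bound. It assumes that each fixed index set $I$ fails with probability at most $P(n)\,n^{-\theta\lfloor n^{\gamma'}\rfloor}$ and that there are at most $n^{\lfloor n^{\beta}\rfloor}$ index sets; both forms are read off from the surrounding text and should be treated as assumptions of this sketch.

```latex
\[
  n^{\lfloor n^{\beta}\rfloor}\cdot P(n)\,n^{-\theta\lfloor n^{\gamma'}\rfloor}
  = P(n)\exp\Bigl\{\bigl(\lfloor n^{\beta}\rfloor-\theta\lfloor n^{\gamma'}\rfloor\bigr)\log n\Bigr\}
  \longrightarrow 0 \qquad (n\to\infty),
\]
% because, for \beta < \gamma', the exponent is at most -(\theta/2)\, n^{\gamma'} \log n
% for large n, while the polynomial P(n) grows only like \exp\{O(\log n)\}.
```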


5. Hermitian Matrices 

In the last section, we derive one extension of the methods of Sections 3 and 4. 
5.1. Sparse Hermitian matrices with heavy tails. Recall that $x$ is a random
variable with heavy tailed distribution and $y$ is a Bernoulli random variable, independent
of $x$, with success probability $n^{\mu-1}$. Let $X_n = [x_{ij}]_{i,j=1}^{n}$ be an $n \times n$ Hermitian
random matrix whose entries along and above the diagonal are i.i.d. copies of $x$, and let
$Y_n = [y_{ij}]_{i,j=1}^{n}$ be a real, symmetric matrix whose entries along and above the diagonal
are i.i.d. copies of $y$. Define

$M_n = X_n \cdot Y_n = [x_{ij}y_{ij}]_{i,j=1}^{n}.$

Since there are, on average, of order $n^{\mu+1}$ independent, nonzero entries in $M_n$, the
right scaling factor $c_n$ for the largest entries of $M_n$ should be of order $n^{\frac{\mu+1}{\alpha}}$, up to a slowly varying factor.

Using an argument similar to that of Sections 3 and 4, one can prove the different
behavior of the eigenvalues and eigenvectors of $M_n$, determined by the tail exponent
$\alpha$ and the sparsity exponent $\mu$. When $0 < \alpha < 2(1+\mu^{-1})$, the largest eigenvalues of
$M_n$ behave like its largest entries. In this case, $n^{\frac{\mu+1}{\alpha}} \gg n^{\frac{\mu}{2}}$, where $n^{\frac{\mu}{2}}$ is the order of
the bulk of the spectrum. The eigenvectors are localized.
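A minimal sampler for an ensemble of this type can be sketched as follows. It is an illustration under stated assumptions, not the construction used in the proofs: the heavy tailed variable is taken to be a symmetrized Pareto variable (so the slowly varying function is constant and the matrix is real symmetric, a special case of Hermitian), and the names `sample_sparse_heavy_tailed` and `max_row_nonzeros` are hypothetical.

```python
import random

def sample_sparse_heavy_tailed(n, alpha, mu, seed=0):
    """Sample a real symmetric n x n matrix with entries x_ij * y_ij.

    x_ij: symmetrized Pareto(alpha), so P(|x| > t) = t**(-alpha) for t >= 1.
    y_ij: Bernoulli with success probability n**(mu - 1).
    Entries on and above the diagonal are independent; the rest is mirrored.
    """
    rng = random.Random(seed)
    p_nonzero = n ** (mu - 1.0)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            x = rng.paretovariate(alpha) * rng.choice((-1.0, 1.0))
            y = 1 if rng.random() < p_nonzero else 0
            m[i][j] = m[j][i] = x * y
    return m

def max_row_nonzeros(m):
    """Maximum number of nonzero entries over all rows."""
    return max(sum(1 for v in row if v != 0.0) for row in m)

# Assumed example parameters.
M = sample_sparse_heavy_tailed(n=40, alpha=1.5, mu=0.5)
print(max_row_nonzeros(M), max(abs(v) for row in M for v in row))
```

Each row has about $n^{\mu}$ nonzero entries on average, which is the quantity that governs the bulk scale $n^{\mu/2}$ mentioned above.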


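Theorem 5.1. Suppose $0 < \alpha < 2(1+\mu^{-1})$. For $1+\mu^{-1} \le \alpha < 2(1+\mu^{-1})$ we also
assume that $x$ is centered. Then for each $l \ge 1$, as $n \to \infty$, we have

(16) $\frac{\lambda_l(M_n)}{|m_{i_lj_l}|} \xrightarrow{p} 1$

and the localization of the corresponding eigenvector, i.e.,

$\left\| v_l(M_n) - \frac{1}{\sqrt{2}}\left(e^{i\theta_l(M_n)/2}e_{i_l} + e^{-i\theta_l(M_n)/2}e_{j_l}\right)\right\|_2 \xrightarrow{p} 0.$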


Again, (16) implies that the random point process $\sum_{l}\delta_{c_n^{-1}\lambda_l(M_n)}1_{\{\lambda_l(M_n)>0\}}$ also
converges in distribution to a Poisson point process on $(0,+\infty)$ with intensity $\frac{\alpha}{x^{1+\alpha}}$ as
$n \to \infty$.
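The shape of a localized eigenvector of this kind can be seen on the $2 \times 2$ Hermitian block carried by a single dominant off-diagonal entry. The sketch below is a toy illustration, assuming that the phase $\theta$ is the argument of that entry $m = |m|e^{i\theta}$; the helper names and the value of `m` are hypothetical.

```python
import cmath
import math

def top_eigenpair_2x2(m):
    """Eigenvalue |m| and a unit eigenvector of [[0, m], [conj(m), 0]]."""
    theta = cmath.phase(m)
    v = (cmath.exp(1j * theta / 2) / math.sqrt(2),
         cmath.exp(-1j * theta / 2) / math.sqrt(2))
    return abs(m), v

def apply_2x2(m, v):
    """Multiply the matrix [[0, m], [conj(m), 0]] by the vector v."""
    return (m * v[1], m.conjugate() * v[0])

# Assumed dominant entry.
m = 3.0 - 4.0j
lam, v = top_eigenpair_2x2(m)
hv = apply_2x2(m, v)
# Largest coordinate of H v - lam v; it should vanish up to rounding.
residual = max(abs(hv[k] - lam * v[k]) for k in range(2))
print(lam, residual)
```

The eigenvalue equals the modulus of the entry and the eigenvector puts mass $1/2$ on each of the two coordinates, matching the two-coordinate profile in the localization statement.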

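When $\alpha > 2(1+\mu^{-1})$, the analogue of Theorem 2.2 is the following.

Theorem 5.2. Suppose $\alpha > 2(1+\mu^{-1})$ and that $x$ has mean zero and variance one.
Then for each $l \ge 1$, as $n \to \infty$, we have

$\frac{\lambda_l(M_n)}{n^{\mu/2}} \xrightarrow{p} 2.$

The eigenvectors are delocalized: there exists $\beta > 0$ such that for each $l \ge 1$, $\eta < 1/2$,

$\mathbb{P}\left(\exists\, l : |\lambda_l(M_n)| > \sqrt{\tfrac{\eta}{1-\eta}}\,\|M_n\| \text{ and } v_l(M_n) \text{ is } (\lfloor n^{\beta}\rfloor,\eta)\text{-localized}\right) \to 0$

as $n \to \infty$.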
Under the extra assumptions that the matrix has, asymptotically, a fixed number of
nonzero entries in almost all rows and no randomness in their positions, Theorems
5.1 and 5.2 were proved in [7]. It is not difficult to modify the arguments there to
include the above results. For instance, Lemmas 3.1 and 3.2 still hold if we replace
$M_{p,n}$ by the Hermitian matrix $M_n$ and $c_{np}$ by $c_n$, and this modification is well suited
for deriving an upper bound for the infinity norms. The next proposition is a small
modification of [7, Theorem 2.1], which allows us to deal with the fact that here the
number of nonzero elements in a row is random and not bounded by $n^{\mu}$.

Proposition 5.3. Suppose $\alpha > 2$ and $x$ has mean zero and variance one. Let $M_n =
X_n \cdot Y_n$ be the sparse $n \times n$ Hermitian matrix with heavy tailed entries. Consider positive
exponents 7 , 7 ' and 7 " such that ^ < 7 ' and 7 + 7 + 7 ^^ < ■ Define the truncated 

matrix

$\hat M_n = [m_{ij}1_{\{|m_{ij}| \le n^{\gamma}\}}]_{i,j=1}^{n}.$

We also assume that the truncated entries are centered. Then for $s_n \le n^{\gamma''}$, there
exists a slowly varying function $L_0$ such that for any $C > 0$

E[TV(Mf")l{L<CnM)] < Lo(n)n^+2^s-3/2(2nT'')2^T 

where $L := \max_{1 \le i \le n}\sum_{j=1}^{n}1_{\{|m_{ij}|>0\}} = \max_{1 \le i \le n}\sum_{j=1}^{n}y_{ij}$, i.e., the maximum of the
number of nonzero entries of all rows.

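Proof. First, note that it is equivalent to truncate the $X_n$ matrix, i.e.,

$\hat M_n = [\hat x_{ij}]_{i,j=1}^{n} \cdot [y_{ij}]_{i,j=1}^{n} = [\hat x_{ij}y_{ij}]_{i,j=1}^{n}$, where $\hat x_{ij} := x_{ij}1_{\{|x_{ij}| \le n^{\gamma}\}}$.

By independence, we write the expected trace on the event $\{L \le Cn^{\mu}\}$ as

$\mathbb{E}[\mathrm{Tr}(\hat M_n^{2s_n})1_{\{L \le Cn^{\mu}\}}] = \sum_{i_1,\ldots,i_{2s_n}=1}^{n}\mathbb{E}[\hat m_{i_1i_2}\cdots\hat m_{i_{2s_n}i_1}1_{\{L \le Cn^{\mu}\}}]$

$= \sum_{i_1,\ldots,i_{2s_n}=1}^{n}\mathbb{E}(\hat x_{i_1i_2}\cdots\hat x_{i_{2s_n}i_1})\,\mathbb{E}(y_{i_1i_2}\cdots y_{i_{2s_n}i_1}1_{\{L \le Cn^{\mu}\}}).$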

Once this factorization is written, the proof follows immediately from the same combinatorics
presented in the proof of [7, Theorem 2.1]. Note that $y_{i_1i_2}\cdots y_{i_{2s_n}i_1}1_{\{L \le Cn^{\mu}\}}$ is
nonzero only if all rows have no more than $Cn^{\mu}$ nonzero entries. When labeling the
vertices in a path $i = (i_1,\ldots,i_{2s_n})$, we have $Cn^{\mu}$ possible choices for each vertex instead
of $n^{\mu}$ in [7, Theorem 2.1]. However, the extra factor $C$ will not play a role in
determining the upper bound. We omit the lengthy calculation here. □
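The path expansion of the trace used in this proof can be illustrated on a deterministic toy example. The sketch below is not the combinatorial argument of [7]; it only checks, for a small symmetric matrix with hypothetical values `X` and 0/1 mask `Y`, that the trace of a power equals the sum over closed paths, and that each path weight splits into a product of `X` factors times a product of `Y` factors.

```python
from itertools import product

def mat_power_trace(m, k):
    """Trace of m**k computed by repeated matrix multiplication."""
    n = len(m)
    acc = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        acc = [[sum(acc[i][l] * m[l][j] for l in range(n)) for j in range(n)]
               for i in range(n)]
    return sum(acc[i][i] for i in range(n))

def path_sum_trace(x, y, k):
    """Sum over closed paths i_1 -> ... -> i_k -> i_1 of (product of x) * (product of y)."""
    n = len(x)
    total = 0
    for path in product(range(n), repeat=k):
        x_part, y_part = 1, 1
        # Consecutive pairs along the path, closing back to the start.
        for a, b in zip(path, path[1:] + path[:1]):
            x_part *= x[a][b]
            y_part *= y[a][b]
        total += x_part * y_part
    return total

# Hypothetical 3 x 3 example: X holds the values, Y the 0/1 sparsity mask.
X = [[2, -1, 3], [-1, 0, 4], [3, 4, 1]]
Y = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
M = [[xv * yv for xv, yv in zip(xr, yr)] for xr, yr in zip(X, Y)]
s = 2
print(mat_power_trace(M, 2 * s) == path_sum_trace(X, Y, 2 * s))
```

A path contributes only when every step uses a position where the mask is 1, which is the deterministic counterpart of the observation that the number of admissible choices per vertex is controlled by the number of nonzero entries in a row.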